Harvesting Application Information for Industry-Scale Relational Schema Matching
Abstract
Consider the problem of migrating a company’s CRM or ERP database from one application to another, or integrating two such databases as a result of a merger. This problem requires matching two large relational schemas with hundreds and sometimes thousands of fields. Further, the correct match is likely complex: rather than a simple one-to-one alignment, some fields in the source database may map to multiple fields in the target database, and others may have no equivalent fields in the target database. Despite major advances in schema matching, fully automated solutions to large relational schema matching problems are still elusive. This paper focuses on improving the accuracy of automated large relational schema matching. Our key insight is that modern database applications have a rich user interface that typically exhibits more consistency across applications than the underlying schemas. We associate UI widgets in the application with the underlying database fields on which they operate and demonstrate that this association delivers new information useful for matching large and complex relational schemas. Additionally, we show how to formalize the schema matching problem as a quadratic program and solve it efficiently using standard optimization and machine learning techniques. We evaluate our approach on real-world CRM applications with hundreds of fields and show that it improves matching accuracy by a factor of 2 to 4.
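The abstract does not spell out the quadratic program, so the following is only a minimal sketch of what such an objective can look like, not the paper's formulation. It assumes a one-to-one matching (the paper also handles complex matches), a hypothetical similarity matrix `sim`, and hypothetical "related field" pairs (`src_edges`, `tgt_edges`), for example fields that appear on the same UI form. The score is linear in per-field similarity plus a pairwise term that is quadratic in the match indicators. Brute-force search stands in for a real solver and is only viable for tiny inputs.

```python
from itertools import permutations

def qp_score(assign, sim, src_edges, tgt_edges, weight=1.0):
    """Score a one-to-one matching; assign[i] is the target index for source field i."""
    # Linear term: how similar each matched pair of fields looks in isolation.
    linear = sum(sim[i][j] for i, j in enumerate(assign))
    # Quadratic term: count related source pairs whose images are also related.
    quadratic = sum(
        1.0
        for a, b in src_edges
        if (assign[a], assign[b]) in tgt_edges
        or (assign[b], assign[a]) in tgt_edges
    )
    return linear + weight * quadratic

def best_matching(sim, src_edges, tgt_edges, weight=1.0):
    """Exhaustively search all one-to-one matchings (illustration only)."""
    n = len(sim)
    return max(
        permutations(range(n)),
        key=lambda p: qp_score(p, sim, src_edges, tgt_edges, weight),
    )

# Hypothetical example: source fields (name, phone, total),
# target fields (client, tel, amount). All values are made up.
sim = [
    [0.6, 0.1, 0.1],
    [0.1, 0.4, 0.6],  # "phone" looks slightly more like "amount" than "tel"
    [0.1, 0.5, 0.5],
]
src_edges = {(0, 1)}  # assumed: name and phone share a UI form in the source
tgt_edges = {(0, 1)}  # assumed: client and tel share a UI form in the target

similarity_only = best_matching(sim, src_edges, tgt_edges, weight=0.0)
with_pairwise = best_matching(sim, src_edges, tgt_edges, weight=1.0)
```

In this toy input, similarity alone matches "phone" to "amount", while adding the pairwise term recovers the intended alignment. That is the kind of extra signal the abstract attributes to UI information.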
Related resources
New Challenges in Data Integration: Large Scale Automatic Schema Matching
Today schema matching is a basic problem in almost every data intensive distributed application, namely enterprise information integration, collaborating web services, ontology based agents communication, web catalogue integration and schema based P2P database systems. There has been a plethora of algorithms and techniques researched in schema matching and integration for data interoperability....
New Challenges: Large Scale Automatic Semantic Integration
Today schema matching is a basic problem in almost every data intensive distributed application, namely enterprise information integration, collaborating web services, ontology based agents communication, web catalogue integration and schema based P2P database systems. There has been a plethora of algorithms and techniques researched in schema matching and integration for data interoperability....
An Approach for Matching Schemas of Heterogeneous Relational Databases
Abstract: Schema matching is a basic problem in many database application domains, such as data integration. The problem of schema matching can be formulated as follows, "given two schemas, S_i and S_j, find the most plausible correspondences between the elements of S_i and S_j, exploiting all available information, such as the schemas, instance data, and auxiliary sources" [24]. Given the ...
EITH - A Unifying Representation for Database Schema and Application Code in Enterprise Knowledge Extraction
The integration of heterogeneous legacy databases requires understanding of database structure and content. We previously developed a theoretical and software infrastructure to support the extraction of schema and business rule information from legacy sources, combining database reverse engineering with semantic analysis of associated application code (DRE/SA). In this paper, we present a compa...
Schema Matching Bibtex
The proposed matching system aims to discover in an automatic way, the correspondence links A survey of approaches to automatic schema matching. Comparing SSD-placement strategies to scale a database-in-the-cloud. Generic Schema Matching, Ten Years Later. Corpus-based Schema Matching. Importing Items via basic bibliographic formats (Endnote, BibTex, RIS, TSV, as for outputing them in appropriat...
Publication date: 2013